Errors are deliberately introduced in the output of a binary message
source to reduce the entropy rate. The errors depend on the source
sequence in a deterministic shift-invariant manner. The tradeoff between
error rate permitted and reduction of entropy rate is of interest.
It is shown that the ideal bound cannot be attained. If the errors are
required to be produced causally, then a bound stronger than the ideal
bound takes over. The quantities of interest are found explicitly for the
example: change all 0's in 0-runs of length 1 to 1's.

If a transmission channel has capacity C bits/second and a message
source has entropy rate H bits/second satisfying $H \le C$, then the source
can be encoded, fed to the channel, decoded at the channel output, and
recovered essentially without error after such handling. The rate-distortion
theory is concerned with the case where $H > C$; we try to minimize
some measure of the errors that are necessarily present (Ref. 1).

We treat here a special class of systems in which the errors are delib- 
erately introduced before submission to the channel to reduce the en- 
tropy rate to that of the channel; the mutilated source is then handled 
without further error by the channel. The usual treatment involves use 
of block codes, but we will examine the more interesting sliding (or 
shift-invariant) codes. 

The source in Fig. 1 emits letters $x_n$, $-\infty < n < \infty$, at rate 1 per unit
time. The letters are drawn from alphabet $A = \{0,1\}$ with probability
distribution $P\{x_n = 0\} = P\{x_n = 1\} = 1/2$, the same for all $n$, and the draws
are statistically independent. We denote by $x = (x_n : -\infty < n < \infty)$ a
sample sequence of the source process $X$. The entropy rate of the source
is $H(X) = 1$ bit per unit time.

The error generator operates on a source sequence $x$ to produce a
sequence $e = (e_n : -\infty < n < \infty)$ of $A$-valued random variables $e_n = e_n(x)$.
The error at time $n$ is a deterministic function
$e_n = \eta(\ldots, x_{n-1}; x_n; x_{n+1}, \ldots)$ of the whole sample sequence $x$. The measurable
function $\eta$ is the same for all $n$, so that the dependence of $e$ on $x$ is shift
invariant: if sequence $x$ is shifted $m$ places, the sequence $e = e(x)$ shifts
$m$ places with it.

Fig. 1. Reducing the entropy rate by introducing errors.

The output of the adder box "$\oplus$" in Fig. 1 is simply $y_n = x_n \oplus e_n$, with
$\oplus$ the usual addition mod 2. We regard the output process $Y$ as $X$
corrupted by the errors $E$. Now, depending on how $E$ is generated, process
$Y$ can have entropy rate $H(Y) < H(X)$, and so can be handled by a
channel of correspondingly smaller capacity at the price of the errors
introduced. We are concerned with the tradeoff between the error rate
and the decrease in entropy rate. Explicitly, suppose error rate $\epsilon$ is
specified, $0 \le \epsilon \le 1/2$, and that $\eta(\ldots;\cdot;\ldots)$ is a stationary error-generating
function with the property $P\{e_n = 1\} = \epsilon$. The resulting $Y$ process will
have a certain entropy rate $H(Y) \le H(X)$ determined by $\eta$. What is the
least value that $H(Y)$ can have for all such $\eta$?

I. THE IDEAL BOUND 

Let us consider the joint process $Z = (Y,E)$, where the $Z$ alphabet is
$\{(0,0),(0,1),(1,0),(1,1)\}$ and each $z_n$ in a sequence $z = (z_n : -\infty < n < \infty)$
is the pair $z_n = (y_n, e_n)$. The mapping $\theta: X \to Z$, which sends a sample
sequence $x$ to sequence $z = \theta x$, is obviously shift invariant. The map
is also measure preserving by definition; the probability measure on the
space of sequences $z$ is that induced by $\theta$ and the $X$ distribution. In the
other direction, $x_n = y_n \oplus e_n$, $-\infty < n < \infty$, is the inverse map $\phi: Z \to X$,
which recovers the source sequence $x$ if the compressed version $y$ and
the errors $e$ are known. This map is also shift invariant and measure
preserving. Since processes $X$ and $Z$ are isomorphic in the above sense,
their entropy rates are the same: $H(Z) = H(X) = 1$.

From the general theory (Section 6 in Ref. 3), the entropy rate $H(Z)$
is the average conditional entropy

$$H(Z) = H(z_1 \mid \ldots, z_0) = H[(y_1,e_1) \mid \ldots, (y_0,e_0)]$$

of letter $z_1$, given the preceding letters $\ldots, z_0$. Using the addition law for
conditional entropy, we find

$$H(Z) = H[e_1 \mid \ldots, (y_0,e_0)] + H[y_1 \mid \ldots, (y_0,e_0) \text{ and } e_1]$$
$$\le H(e_1) + H(y_1 \mid \ldots, y_0)$$
$$= h(\epsilon) + H(Y),$$

since $H(e_1) = h(\epsilon) = \epsilon \log_2(1/\epsilon) + (1 - \epsilon) \log_2[1/(1 - \epsilon)]$ when $P\{e_1 = 1\} = \epsilon$,
$P\{e_1 = 0\} = 1 - \epsilon$. Using $H(Z) = 1$, we have the lower bound
$H(Y) \ge 1 - h(\epsilon)$, $0 \le \epsilon \le 1/2$, for any such compression scheme.
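The ideal bound is easy to tabulate. The following is a minimal sketch, not part of the original paper; the function names `binary_entropy` and `ideal_bound` are illustrative choices.

```python
import math


def binary_entropy(eps: float) -> float:
    """h(eps) in bits, with h(0) = h(1) = 0 by convention."""
    if eps in (0.0, 1.0):
        return 0.0
    return eps * math.log2(1 / eps) + (1 - eps) * math.log2(1 / (1 - eps))


def ideal_bound(eps: float) -> float:
    """Lower bound 1 - h(eps) on H(Y) when the error rate is eps."""
    return 1.0 - binary_entropy(eps)


# Tabulate the bound for a few error rates in [0, 1/2].
for eps in (0.0, 1 / 8, 1 / 4, 1 / 2):
    print(f"eps = {eps:.3f}   1 - h(eps) = {ideal_bound(eps):.6f}")
```

At $\epsilon = 1/2$ the bound drops to zero, which matches the trivial scheme that replaces every source letter by a constant.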
Our first result is: 

Theorem 1: For error rate $0 < \epsilon < 1/2$ it is not possible to attain the
bound $H(Y) = 1 - h(\epsilon)$.

Proof: For each fixed $N \ge 1$, there holds $N H(Z) = H(z_1, \ldots, z_N \mid \ldots, z_0)$,
by induction from

$$H(z_1, \ldots, z_N \mid \ldots, z_0) = H(z_1 \mid \ldots, z_0) + H(z_2, \ldots, z_N \mid \ldots, z_1).$$

Arguing as before, we find

$$N = H[(y_1,e_1), \ldots, (y_N,e_N) \mid \ldots, (y_0,e_0)]$$
$$= H[y_1, \ldots, y_N \mid \ldots, (y_0,e_0)] + H[e_1, \ldots, e_N \mid \ldots, (y_0,e_0) \text{ and } y_1, \ldots, y_N];$$

moreover,

$$H[y_1, \ldots, y_N \mid \ldots, (y_0,e_0)] \le H(y_1, \ldots, y_N \mid \ldots, y_0) = N H(Y)$$

and

$$H(e_1, \ldots, e_N \mid \ldots, e_0 \text{ and } \ldots, y_N) \le N H(e_1) = N h(\epsilon).$$

Now, equality in this last step holds iff $e_1, \ldots, e_N, f(\ldots, e_0 \text{ and } \ldots, y_N)$
are mutually independent, where $f$ is any measurable function of the variables
indicated. (Equality in the first step requires that $y_1, \ldots, y_N$ be
conditionally independent of $\ldots, e_0$ given $\ldots, y_0$, but we will not need
this.)

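For real valued variable $u$, let us define

$$u^{(\alpha)} = \begin{cases} u & \text{if } \alpha = 0, \\ 1 - u & \text{if } \alpha = 1; \end{cases}$$

we put also $u^{(\alpha)(\beta)} = [u^{(\alpha)}]^{(\beta)} = [u^{(\beta)}]^{(\alpha)}$ for all $\alpha,\beta \in A$; note that
$u^{(0)(0)} = u^{(1)(1)} = u$, $u^{(0)(1)} = u^{(1)(0)} = 1 - u$. From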

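$$x_j = y_j \oplus e_j$$
$$= y_j e_j + (1 - y_j)(1 - e_j)$$
$$= y_j^{(0)} e_j^{(0)(0)} + y_j^{(1)} e_j^{(1)(0)}$$

and

$$1 - x_j = y_j \oplus (e_j \oplus 1)$$
$$= y_j (1 - e_j) + (1 - y_j) e_j$$
$$= y_j^{(0)} e_j^{(0)(1)} + y_j^{(1)} e_j^{(1)(1)},$$

it is apparent that

$$x_j^{(\alpha)} = \sum_{\beta} y_j^{(\beta)} e_j^{(\alpha)(\beta)},$$

where $\alpha,\beta$ are variables in the set $A$. Multiplying these equations together
for $1 \le j \le N$ gives

$$x_1^{(\alpha_1)} \cdots x_N^{(\alpha_N)} = \sum_{\beta_1} \cdots \sum_{\beta_N} y_1^{(\beta_1)} \cdots y_N^{(\beta_N)} \, e_1^{(\alpha_1)(\beta_1)} \cdots e_N^{(\alpha_N)(\beta_N)}$$

for each of the $2^N$ choices for $\alpha_1, \ldots, \alpha_N$.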

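If $H(Y) = 1 - h(\epsilon)$, then $e_1, \ldots, e_N, [y_1^{(\beta_1)} \cdots y_N^{(\beta_N)}]$ are mutually
independent for each choice of the $\beta_j$'s. Since $E\{e_j^{(\gamma)}\} = \epsilon^{(\gamma)}$, $1 \le j \le N$, we
find

$$\frac{1}{2^N} = \sum_{\beta_1} \cdots \sum_{\beta_N} \epsilon^{(\alpha_1)(\beta_1)} \cdots \epsilon^{(\alpha_N)(\beta_N)} E\{y_1^{(\beta_1)} \cdots y_N^{(\beta_N)}\}, \quad \text{all } \alpha\text{'s}.$$

Using now the assumption $\epsilon \ne 1/2$, let $c$ be the number $c = -\epsilon/(1 - 2\epsilon)$,
so that $c^{(1)} = (1 - \epsilon)/(1 - 2\epsilon)$. From

$$\sum_{\alpha} c^{(\alpha)(\gamma)} = 1, \qquad \sum_{\alpha} c^{(\alpha)(\gamma)} \epsilon^{(\alpha)(\beta)} = \delta_{\beta\gamma},$$

we obtain

$$\frac{1}{2^N} = \sum_{\alpha_1} \cdots \sum_{\alpha_N} c^{(\alpha_1)(\gamma_1)} \cdots c^{(\alpha_N)(\gamma_N)} \frac{1}{2^N}$$
$$= \sum_{\alpha_1} \cdots \sum_{\alpha_N} \sum_{\beta_1} \cdots \sum_{\beta_N} c^{(\alpha_1)(\gamma_1)} \epsilon^{(\alpha_1)(\beta_1)} \cdots c^{(\alpha_N)(\gamma_N)} \epsilon^{(\alpha_N)(\beta_N)} E\{y_1^{(\beta_1)} \cdots y_N^{(\beta_N)}\}$$
$$= E\{y_1^{(\gamma_1)} \cdots y_N^{(\gamma_N)}\}, \quad \text{all } \gamma\text{'s}.$$

If this holds for all $N \ge 1$, then the $\{y_n : -\infty < n < \infty\}$ are independent
identically distributed random variables with distribution
$P\{y_n = 0\} = P\{y_n = 1\} = 1/2$. The entropy rate of this process is
$H(Y) = 1 \ne 1 - h(\epsilon)$. $\square$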
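The two summation identities for $c$ that drive the last step of the proof can be confirmed in exact rational arithmetic. The sketch below is an illustration only; the helper names `flip` and `check_inversion` are hypothetical and do not come from the paper.

```python
from fractions import Fraction


def flip(u, a):
    """u^(a): u itself when a = 0, and 1 - u when a = 1."""
    return u if a == 0 else 1 - u


def check_inversion(eps: Fraction) -> bool:
    """Check sum_a c^(a)(g) = 1 and sum_a c^(a)(g) eps^(a)(b) = delta_{b,g}."""
    c = -eps / (1 - 2 * eps)  # requires eps != 1/2
    for g in (0, 1):
        if sum(flip(flip(c, a), g) for a in (0, 1)) != 1:
            return False
        for b in (0, 1):
            s = sum(flip(flip(c, a), g) * flip(flip(eps, a), b) for a in (0, 1))
            if s != (1 if b == g else 0):
                return False
    return True


print(check_inversion(Fraction(1, 8)))
```

The matrix with entries $c^{(\alpha)(\gamma)}$ acts as the inverse of the matrix with entries $\epsilon^{(\alpha)(\beta)}$, which is why the construction needs $\epsilon \ne 1/2$.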
II. THE CAUSAL BOUND

We now consider the case where each $e_n$ depends only on the present
and past values of the $x$'s. That is, $e_n = \eta(\ldots, x_{n-1}; x_n)$, $-\infty < n < \infty$, for
$\eta$ a measurable function of the variables indicated. The relation between
$Z$ and $X$ is thus bicausal: $z_n$ depends only on $\ldots, x_n$ and $x_n$ depends only
on $\ldots, z_n$. It follows that conditionals given $\ldots, z_n$ agree w.p.1 with
conditionals given $\ldots, x_n$.

Theorem 2: If the dependence of the error process $E$ on $X$ is causal, then
$H(Y) \ge 1 - 2\epsilon$, $0 \le \epsilon \le 1/2$.

Proof: A variant form of the basic inequality is
$H(Y) \ge 1 - H(e_0 \mid \ldots, x_{-1})$, obtained from

$$1 = H(Z) = H[(y_0,e_0) \mid \ldots, (y_{-1},e_{-1})]$$
$$= H[e_0 \mid \ldots, (y_{-1},e_{-1})] + H[y_0 \mid \ldots, (y_{-1},e_{-1}) \text{ and } e_0]$$
$$\le H(e_0 \mid \ldots, x_{-1}) + H(y_0 \mid \ldots, y_{-1});$$

we have used only that $y_n \oplus e_n$ is less informative than $(y_n, e_n)$,
$-\infty < n \le -1$. The assumption that $\eta$ is causal is not involved.

Let us partition the space of sample sequences $x$ into the four disjoint
subsets:

$A_1 = \{x: \eta(\ldots, x_{-1}; 0; x_1, \ldots) = 0 \text{ and } \eta(\ldots, x_{-1}; 1; x_1, \ldots) = 0\}$

$A_2 = \{x: \eta(\ldots, x_{-1}; 0; x_1, \ldots) = 1 \text{ and } \eta(\ldots, x_{-1}; 1; x_1, \ldots) = 1\}$

$A_3 = \{x: \eta(\ldots, x_{-1}; 0; x_1, \ldots) = 0 \text{ and } \eta(\ldots, x_{-1}; 1; x_1, \ldots) = 1\}$

$A_4 = \{x: \eta(\ldots, x_{-1}; 0; x_1, \ldots) = 1 \text{ and } \eta(\ldots, x_{-1}; 1; x_1, \ldots) = 0\}.$

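The random variable $\kappa(x)$ is defined as the part namer for this partition;
i.e., $\kappa(x) = j$ iff $x \in A_j$, $1 \le j \le 4$. Since $\kappa(x)$ depends only on coordinates
$\ldots, x_{-1}$ and $x_1, \ldots$ of $x$, the conditional distribution of $x_0$ given $\kappa$ is
$P\{x_0 = 0 \mid \kappa\} = P\{x_0 = 1 \mid \kappa\} = 1/2$ w.p.1. The resulting random conditional
entropies of $e_0, y_0$ are seen to be

$$h(e_0 \mid \ldots, x_{-1} \text{ and } x_1, \ldots) = h(e_0 \mid \kappa) = \begin{cases} 0 & \text{for } \kappa = 1,2 \\ 1 & \text{for } \kappa = 3,4 \end{cases}$$

$$h(y_0 \mid \ldots, x_{-1} \text{ and } x_1, \ldots) = h(y_0 \mid \kappa) = \begin{cases} 1 & \text{for } \kappa = 1,2 \\ 0 & \text{for } \kappa = 3,4. \end{cases}$$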

Putting $a_i = P\{A_i\}$, $1 \le i \le 4$, the average conditional entropies are
then

$$H(e_0 \mid \ldots, x_{-1} \text{ and } x_1, \ldots) = H(e_0 \mid \kappa) = a_3 + a_4$$
$$H(y_0 \mid \ldots, x_{-1} \text{ and } x_1, \ldots) = H(y_0 \mid \kappa) = a_1 + a_2.$$

The error rate is

$$\epsilon = P\{e_0 = 1\} = \frac{1}{2}(a_3 + a_4) + a_2,$$

so we have

$$H(e_0 \mid \ldots, x_{-1} \text{ and } x_1, \ldots) = 2\epsilon - 2a_2 \le 2\epsilon.$$

Assume now that $E$ depends causally on $X$; then, $e_0$ is conditionally
independent of $x_1, \ldots$ given $\ldots, x_{-1}$, implying
$H(e_0 \mid \ldots, x_{-1} \text{ and } x_1, \ldots) = H(e_0 \mid \ldots, x_{-1})$.
Combining the inequalities, we obtain $H(Y) \ge 1 - 2\epsilon$. $\square$

This bound is strictly above the ideal bound when $0 < \epsilon < 1/2$, since
$h(\epsilon) > 2\epsilon$ on this interval.
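The gap between the two bounds can be inspected numerically. This is a small sketch under the stated definitions; the grid of error rates and the names `ideal_bound` and `causal_bound` are choices made here for illustration.

```python
import math


def h(eps: float) -> float:
    """Binary entropy in bits."""
    if eps in (0.0, 1.0):
        return 0.0
    return eps * math.log2(1 / eps) + (1 - eps) * math.log2(1 / (1 - eps))


def ideal_bound(eps: float) -> float:
    return 1 - h(eps)


def causal_bound(eps: float) -> float:
    return 1 - 2 * eps


# The gap causal_bound - ideal_bound equals h(eps) - 2*eps.
grid = [k / 100 for k in range(1, 50)]
gaps = [causal_bound(e) - ideal_bound(e) for e in grid]
widest = max(zip(gaps, grid))
print(f"largest gap on the grid: {widest[0]:.4f} at eps = {widest[1]:.2f}")
```

The gap vanishes only at the endpoints $\epsilon = 0$ and $\epsilon = 1/2$, because $h$ is strictly concave and meets the chord $2\epsilon$ there.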



III. EXAMPLE 

The following example is mentioned in Ref. 4, but a solution is not
given. Let the errors be $e_n = \eta(x_{n-1}; x_n; x_{n+1})$, $-\infty < n < \infty$, with $\eta$ the
function

$$\eta(1;0;1) = 1$$
$$\eta(x_{-1}; x_0; x_1) = 0 \quad \text{if } x_{-1}x_0x_1 \ne 101.$$

The error rate is $P\{e_n = 1\} = 1/8$. We will compute $H(Y)$ and $H(E)$
explicitly and compare $H(Y)$ with the bounds of Sections I and II.
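The example rule is small enough to enumerate. The sketch below is illustrative and not taken from the paper; `corrupt` is a hypothetical helper that applies the rule to the interior of a finite list, since a finite sample has no neighbours at its two ends.

```python
from fractions import Fraction
from itertools import product


def eta(x_prev, x_now, x_next):
    """Error rule of the example: 1 only on the pattern 1,0,1."""
    return 1 if (x_prev, x_now, x_next) == (1, 0, 1) else 0


def corrupt(x):
    """Return y_n = x_n XOR e_n for the interior positions of a finite list x."""
    return [x[n] ^ eta(x[n - 1], x[n], x[n + 1]) for n in range(1, len(x) - 1)]


# All eight equally likely triples (x_{n-1}, x_n, x_{n+1}).
triples = list(product((0, 1), repeat=3))
error_rate = Fraction(sum(eta(*t) for t in triples), len(triples))
p_y_one = Fraction(sum(t[1] ^ eta(*t) for t in triples), len(triples))

print(error_rate, p_y_one)                     # 1/8 5/8
print(corrupt([1, 0, 1, 1, 0, 0, 1, 0, 1]))    # [1, 1, 1, 0, 0, 1, 1]
```

In the sample list the two isolated 0's are flipped to 1, while the 0-run of length 2 is left alone.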

A graphical representation of $\eta$ is given in Fig. 2. The vertices of the
directed graph are the state pairs $x_{-1}x_0$, the arrows represent the
transitions from $x_{-1}x_0$ to $x_0x_1$, and the value $\eta(x_{-1}; x_0; x_1)$ is shown on the
arrow from $x_{-1}x_0$ to $x_0x_1$. The corresponding graph of
$y_0 = x_0 \oplus \eta(x_{-1}; x_0; x_1)$ appears in Fig. 3.

We now compute $H(Y)$. Examination of Fig. 3 reveals that process
$Y$ is a renewal process, with renewal at the beginning of each run, either
of 0's or of 1's. Moreover, the length $R^{(0)}$ of a 0-run has the geometric
distribution

$$P\{R^{(0)} = j\} = \frac{1}{2^{j-1}}, \quad j = 2,3,\ldots.$$

The mean and entropy of this distribution are easily found to be
$E\{R^{(0)}\} = 3$, $H(R^{(0)}) = 2$.

Fig. 2. $e_0 = \eta(x_{-1}; x_0; x_1)$. Values of $e_0$ as function of the transition.

Fig. 3. $y_0 = x_0 \oplus \eta(x_{-1}; x_0; x_1)$. Values of $y_0$ as function of the transition.

The 1-runs of $Y$ involve the subgraph shown in Fig. 4, relabelled for
convenience. A 1-run results from a path (driven by $x$) which starts at
A and follows lettered arrows until exit occurs at B along the dotted
arrow. If the length $R^{(1)}$ of the run has value $R^{(1)} = j$, the driving $x$'s have
probability $2^{-(j+1)}$ per path, so

$$P\{R^{(1)} = j\} = \frac{\nu_j}{2^{j+1}}, \quad j = 1,2,\ldots,$$

where $\nu_j$ is the number of paths of length $j$ from A to B along lettered
arrows.

For $j \ge 4$, we classify the paths of length $j$ from A to B according to
the earliest appearance of arrow a:

(i) One path $c(d)^{j-2}e$ which does not contain a.
(ii) Paths which start $ba \cdots$.
(iii) For each $0 \le k \le j - 4$, paths which start $c(d)^k e a \cdots$.

In (ii) the continuations "$\cdots$" are just the paths from A to B of length
$j - 2$, one each, and in (iii) the lengths of the continuations are
$j - (k + 3)$ for each $0 \le k \le j - 4$. In consequence,

$$\nu_j = 1 + \nu_{j-2} + \sum_{k=0}^{j-4} \nu_{j-(k+3)}, \quad j \ge 4.$$

Fig. 4. Subgraph for 1-runs of process $Y$.

The initial terms for the recursion are $\nu_1 = 1$, $\nu_2 = 1$, $\nu_3 = 2$, clearly, and
it is convenient to define $\nu_0 = 0$. Then from

$$\nu_j = 1 + \nu_{j-2} + \nu_{j-3} + \cdots + \nu_0, \quad j \ge 2,$$
$$\nu_{j-1} = 1 + \nu_{j-3} + \cdots + \nu_0, \quad j \ge 3,$$

it is apparent that

$$\nu_j = \nu_{j-1} + \nu_{j-2}, \quad j \ge 3.$$

That is, $\nu_1, \nu_2, \ldots$ is the Fibonacci sequence $1,1,2,3,5,8,\ldots$.
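The reduction to the Fibonacci recurrence can be checked by running the path-classification recursion directly. The graph of Fig. 4 is not reproduced here, so this sketch takes the recursion and its initial terms as given and only confirms the algebra; `path_counts` and `fibonacci` are names invented for the illustration.

```python
def path_counts(n: int) -> list:
    """nu_1..nu_n from nu_j = 1 + nu_{j-2} + sum_{k=0}^{j-4} nu_{j-(k+3)}."""
    nu = {0: 0, 1: 1, 2: 1, 3: 2}  # initial terms stated in the text
    for j in range(4, n + 1):
        nu[j] = 1 + nu[j - 2] + sum(nu[j - (k + 3)] for k in range(0, j - 3))
    return [nu[j] for j in range(1, n + 1)]


def fibonacci(n: int) -> list:
    """First n Fibonacci numbers 1, 1, 2, 3, 5, ..."""
    fib = [1, 1]
    while len(fib) < n:
        fib.append(fib[-1] + fib[-2])
    return fib[:n]


print(path_counts(10))
print(path_counts(10) == fibonacci(10))
```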

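The generating function for the Fibonacci sequence is
$\sum_{j=1}^{\infty} \nu_j x^j = x/(1 - x - x^2)$, as is well known, so the distribution of $R^{(1)}$ has generating
function

$$\sum_{j=1}^{\infty} x^j P\{R^{(1)} = j\} = \frac{x}{4 - 2x - x^2}.$$

Taking $(d/dx)_{x=1}$, we obtain $E\{R^{(1)}\} = 5$ for the mean length of 1-runs
in the $Y$ process. For numerical evaluation of the entropy $H(R^{(1)})$ of the
$R^{(1)}$ distribution, we obtain the $r_j = P\{R^{(1)} = j\}$, $j \ge 1$, from the
recursion

$$r_j = \frac{1}{2} r_{j-1} + \frac{1}{4} r_{j-2}, \quad j \ge 3; \qquad r_1 = \frac{1}{4}, \quad r_2 = \frac{1}{8}.$$

The numerical result is

$$H(R^{(1)}) = \sum_{j=1}^{\infty} r_j \log_2 \frac{1}{r_j} = 3.593946 \text{ bits per run}.$$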
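The entropy of the 1-run length can be recomputed from the distribution $P\{R^{(1)} = j\} = \nu_j/2^{j+1}$, which satisfies $r_j = r_{j-1}/2 + r_{j-2}/4$ with $r_1 = 1/4$, $r_2 = 1/8$ because the $\nu_j$ are Fibonacci numbers. The following is a rough numerical sketch; the truncation at 400 terms is an arbitrary choice made here, large enough that the neglected tail is far below double precision.

```python
import math


def run_length_probs(n_terms: int) -> list:
    """r_1..r_n with r_j = r_{j-1}/2 + r_{j-2}/4, r_1 = 1/4, r_2 = 1/8."""
    r = [0.25, 0.125]
    while len(r) < n_terms:
        r.append(0.5 * r[-1] + 0.25 * r[-2])
    return r


r = run_length_probs(400)
total = sum(r)                                            # should be close to 1
mean = sum(j * rj for j, rj in enumerate(r, start=1))     # mean 1-run length
entropy = sum(rj * math.log2(1 / rj) for rj in r)         # bits per run

print(f"sum = {total:.12f}, mean = {mean:.6f}, entropy = {entropy:.6f}")
```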

Starting at the beginning of a run, suppose $y$ is truncated after $M$ runs
of both kinds have occurred. The total number of coordinates $y_n$ is the
sum $\sum_{m=1}^{M} [R_m^{(0)} + R_m^{(1)}]$ of $M$ independent samples each of $R^{(0)}, R^{(1)}$. The total
random entropy is the corresponding sum $\sum_{m=1}^{M} [h(R_m^{(0)}) + h(R_m^{(1)})]$ for the
samples. Omitting the detailed arguments, we obtain from the strong
law of large numbers

$$H(Y) = \lim_{M \to \infty} \frac{\sum_{m=1}^{M} [h(R_m^{(0)}) + h(R_m^{(1)})]}{\sum_{m=1}^{M} [R_m^{(0)} + R_m^{(1)}]} \quad \text{w.p.1}$$
$$= \frac{H(R^{(0)}) + H(R^{(1)})}{E\{R^{(0)}\} + E\{R^{(1)}\}}$$
$$= 0.699243 \text{ bit per letter}.$$

434 THE BELL SYSTEM TECHNICAL JOURNAL, MARCH 1977



As a check, note

$$P\{y_n = 1\} = \frac{E\{R^{(1)}\}}{E\{R^{(0)}\} + E\{R^{(1)}\}} = \frac{5}{8},$$

which is clear from Fig. 3. The entropy of $y_0$ is $H(y_0) = h(3/8) = 0.954434$,
and the difference

$$h(3/8) - H(Y) = H(y_0) - H(y_0 \mid \ldots, y_{-1}) = 0.255191 \text{ bit per letter}$$

is the amount by which $Y$ fails to be a Bernoulli process. The ideal bound
is

$$H(Y) \ge 1 - h(1/8) = 0.456436 \text{ bit per letter}.$$

The bound of Section II is easily worked out to be

$$H(Y) \ge 1 - H(e_0 \mid \ldots, x_{-1}) = 1 - (1/2)h(1/4) = 0.594361 \text{ bit per letter}.$$
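The three figures can be placed side by side with a short calculation. This sketch assumes the renewal formula $H(Y) = [H(R^{(0)}) + H(R^{(1)})]/[E\{R^{(0)}\} + E\{R^{(1)}\}]$ with $H(R^{(0)}) = 2$, $E\{R^{(0)}\} = 3$, $E\{R^{(1)}\} = 5$ and the recursion $r_j = r_{j-1}/2 + r_{j-2}/4$ for the 1-run distribution; the variable names and the 400-term truncation are choices made here.

```python
import math


def h(p: float) -> float:
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


# Distribution of the 1-run length R^(1), truncated at 400 terms.
r = [0.25, 0.125]
while len(r) < 400:
    r.append(0.5 * r[-1] + 0.25 * r[-2])
H_R1 = sum(p * math.log2(1 / p) for p in r)

H_R0, E_R0, E_R1 = 2.0, 3.0, 5.0                 # values quoted in the text
H_Y = (H_R0 + H_R1) / (E_R0 + E_R1)              # entropy rate of Y
ideal = 1 - h(1 / 8)                             # bound of Section I
section2 = 1 - 0.5 * h(1 / 4)                    # bound of Section II

print(f"ideal = {ideal:.6f}, Section II = {section2:.6f}, H(Y) = {H_Y:.6f}")
print(f"h(3/8) - H(Y) = {h(3 / 8) - H_Y:.6f}")
```

The ordering ideal bound < Section II bound < $H(Y)$ < $h(3/8)$ shows how far this particular scheme sits from each limit.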

The entropy rate $H(E)$ of the errors can also be obtained from
run-length considerations. Indeed, $\{e_n = 1\}$ is just the event $\{x_n = 0$ is a 0-run
of length 1$\}$ in process $X$. The 0-run lengths $S^{(0)}$ and the 1-run lengths
$S^{(1)}$ in process $X$ each have the geometric distribution

$$P\{S^{(0)} = j\} = P\{S^{(1)} = j\} = \frac{1}{2^j}, \quad j = 1,2,\ldots,$$

as is well known. Let the run lengths after an occurrence of $\{S^{(0)} = 1\}$ be
$S_1^{(1)}, S_1^{(0)}, S_2^{(1)}, S_2^{(0)}, \ldots$, and let random variable $J$ be the smallest $\nu \ge 1$
for which $S_\nu^{(0)} = 1$. Since $P\{S^{(0)} = 1\} = P\{S^{(0)} > 1\} = 1/2$, we again have
(1/2, 1/2) Bernoulli trials, i.e.,

$$P\{J = j\} = \frac{1}{2^j}, \quad j = 1,2,\ldots.$$

The number of intervening $x_n$'s is the 0-run length
$V^{(0)} = S_1^{(1)} + S_1^{(0)} + \cdots + S_{J-1}^{(0)} + S_J^{(1)}$ in the $E$ process. The generating function for each $S^{(1)}$
is

$$\sum_{j=1}^{\infty} x^j P\{S^{(1)} = j\} = \frac{x}{2 - x},$$

and the generating function for $S^{(0)}$ conditional on $S^{(0)} > 1$ is

$$\sum_{j=2}^{\infty} x^j P\{S^{(0)} = j \mid S^{(0)} > 1\} = \frac{x^2}{2 - x},$$

so we have

$$\sum_{k=1}^{\infty} x^k P\{V^{(0)} = k\} = \sum_{j=1}^{\infty} \frac{1}{2^j} \left(\frac{x}{2 - x}\right)^{j} \left(\frac{x^2}{2 - x}\right)^{j-1} = \frac{x(2 - x)}{8 - 8x + 2x^2 - x^3}, \quad |x| < 1.13968.$$



Taking $(d/dx)_{x=1}$ gives $E\{V^{(0)}\} = 7$, and since the 1-runs in $E$ have length
$V^{(1)} = 1$ w.p.1, we have the check

$$P\{e_n = 1\} = \frac{E\{V^{(1)}\}}{E\{V^{(0)}\} + E\{V^{(1)}\}} = \frac{1}{8}.$$
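The closed form $x(2-x)/(8 - 8x + 2x^2 - x^3)$ for the generating function of $V^{(0)}$ can be compared with the defining series $\sum_j 2^{-j} [x/(2-x)]^j [x^2/(2-x)]^{j-1}$ coefficient by coefficient. The sketch below does this in exact arithmetic up to a truncation degree `K`, a value chosen arbitrarily here, and also evaluates the derivative of the closed form at $x = 1$.

```python
from fractions import Fraction

K = 15  # compare coefficients of x^0 .. x^K


def mul(p, q):
    """Product of two coefficient lists, truncated at degree K."""
    out = [Fraction(0)] * (K + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if a and b and i + j <= K:
                out[i + j] += a * b
    return out


# x/(2 - x) = sum_{j>=1} x^j / 2^j and x^2/(2 - x) = sum_{j>=2} x^j / 2^(j-1)
A = [Fraction(0)] + [Fraction(1, 2 ** j) for j in range(1, K + 1)]
B = [Fraction(0), Fraction(0)] + [Fraction(1, 2 ** (j - 1)) for j in range(2, K + 1)]

series = [Fraction(0)] * (K + 1)
term = A[:]                       # A^j B^(j-1) for j = 1
for j in range(1, K + 1):         # lowest power of the j-th term is 3j - 2
    series = [s + t / 2 ** j for s, t in zip(series, term)]
    term = mul(mul(term, A), B)

# series * (8 - 8x + 2x^2 - x^3) should equal 2x - x^2 up to degree K.
denominator = [Fraction(c) for c in (8, -8, 2, -1)] + [Fraction(0)] * (K - 3)
numerator = [Fraction(c) for c in (0, 2, -1)] + [Fraction(0)] * (K - 2)
closed_form_ok = mul(series, denominator) == numerator

# G'(1) for G = N/D with N = 2x - x^2 and D = 8 - 8x + 2x^2 - x^3.
N1, dN1 = 2 - 1, 2 - 2
D1, dD1 = 8 - 8 + 2 - 1, -8 + 4 - 3
mean_V0 = Fraction(dN1 * D1 - N1 * dD1, D1 ** 2)

print(closed_form_ok, mean_V0, Fraction(1, mean_V0 + 1))
```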
For numerical evaluation of $H(V^{(0)})$, we use the recurrence

$$v_k = v_{k-1} - \frac{1}{4} v_{k-2} + \frac{1}{8} v_{k-3}, \quad k \ge 4; \qquad v_1 = \frac{1}{4}, \quad v_2 = \frac{1}{8}, \quad v_3 = \frac{1}{16},$$

satisfied by $v_k = P\{V^{(0)} = k\}$, $k \ge 1$. The numerical result is

$$H(V^{(0)}) = \sum_{k=1}^{\infty} v_k \log_2 \frac{1}{v_k} = 4.061168 \text{ bits per run},$$

giving

$$H(E) = \frac{H(V^{(0)}) + H(V^{(1)})}{E\{V^{(0)}\} + E\{V^{(1)}\}} = \frac{1}{8} H(V^{(0)}) = 0.507646 \text{ bit per letter}$$

as the entropy rate of the errors. The entropy of $e_0$ being $H(e_0) = h(1/8)$
$= 0.543564$ bit per letter, the difference

$$h(\epsilon) - H(E) = H(e_0) - H(e_0 \mid \ldots, e_{-1}) = I(e_0; \{\ldots, e_{-1}\}) = 0.035918 \text{ bit per letter}$$

is the amount by which $E$ fails to be Bernoulli.
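The figures for the error process can be recomputed in a few lines. This sketch assumes the recurrence $v_k = v_{k-1} - v_{k-2}/4 + v_{k-3}/8$ with starting values $v_1 = 1/4$, $v_2 = 1/8$, $v_3 = 1/16$, as read off from the generating function $x(2-x)/(8 - 8x + 2x^2 - x^3)$; the 600-term truncation is an arbitrary choice made here.

```python
import math


def h(p: float) -> float:
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


# v_k = P{V^(0) = k}, truncated at 600 terms.
v = [0.25, 0.125, 0.0625]
while len(v) < 600:
    v.append(v[-1] - 0.25 * v[-2] + 0.125 * v[-3])

H_V0 = sum(p * math.log2(1 / p) for p in v if p > 0)   # bits per 0-run of E
H_E = H_V0 / 8             # 1-runs have length 1 and entropy 0; mean cycle is 8
deficit = h(1 / 8) - H_E   # amount by which E fails to be Bernoulli

print(f"H(V0) = {H_V0:.6f}, H(E) = {H_E:.6f}, deficit = {deficit:.6f}")
```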